50/50 Expressional Odds of Retention Signifies the Distinction between Retained Introns and Constitutively Spliced Introns in Arabidopsis thaliana
Abstract
Intron retention, one of the most prevalent alternative splicing events in plants, can lead to introns retained in mature mRNAs. However, the features that distinguish retained introns (RIs) from constitutively spliced introns (CSIs) are still poorly understood. This work proposes a computational pipeline to discover novel RIs from multiple next-generation RNA sequencing (RNA-Seq) datasets of Arabidopsis thaliana. Using this pipeline, we detected 3,472 novel RIs from 18 RNA-Seq datasets and re-confirmed 1,384 RIs that are currently annotated in the TAIR10 database. We also use the expression of intron-containing isoforms as a new feature in addition to the conventional features. Based on these features, machine learning methods can readily distinguish RIs from CSIs, especially when the expressional odds of retention (i.e., the expression ratio of the RI-containing isoforms relative to the isoforms without RIs for the same gene) reaches or exceeds 50/50. In this case, a Random Forest classifier separates RIs from CSIs with an AUC (area under the receiver operating characteristic curve) of 0.95. The characteristics most closely associated with RIs include low splice-site strength, high similarity with the flanking exon sequences, a low occurrence percentage of YTRAY near the acceptor site, the existence of putative intronic splicing silencers (ISSs, i.e., AG/GA-rich motifs) and intronic splicing enhancers (ISEs, i.e., TTTT-containing motifs), and enrichment of Serine/Arginine-Rich (SR) proteins and heterogeneous nuclear ribonucleoparticle proteins (hnRNPs).
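As a rough illustration of the expressional odds of retention defined in the abstract, the ratio can be sketched as below. This is a minimal sketch, not the authors' pipeline: the function names, the expression unit, the handling of zero expression, and the reading of "50/50" as a ratio of 1.0 are all assumptions made for the example.

```python
import math


def expressional_odds(ri_expression, non_ri_expression):
    """Expression of RI-containing isoforms divided by expression of the
    isoforms without the RI, for the same gene.

    Both arguments are non-negative expression values in the same
    (hypothetical) unit. Zero handling is an assumption of this sketch.
    """
    if ri_expression < 0 or non_ri_expression < 0:
        raise ValueError("expression values must be non-negative")
    if non_ri_expression == 0:
        # Only RI-containing isoforms are expressed (or nothing is).
        return math.inf if ri_expression > 0 else 0.0
    return ri_expression / non_ri_expression


def reaches_even_odds(ri_expression, non_ri_expression, threshold=1.0):
    """True when the odds reach or exceed the threshold.

    threshold=1.0 is this sketch's interpretation of "50/50".
    """
    return expressional_odds(ri_expression, non_ri_expression) >= threshold


# Hypothetical expression values for three introns: (RI isoforms, non-RI isoforms).
examples = {
    "intron_a": (5.0, 5.0),
    "intron_b": (2.0, 8.0),
    "intron_c": (9.0, 3.0),
}
selected = [name for name, (ri, non_ri) in examples.items()
            if reaches_even_odds(ri, non_ri)]
```

With these made-up values, only the introns whose RI-containing isoforms are expressed at least as strongly as the other isoforms of the gene are kept in `selected`.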
Similar resources
Comparative Analyses between Retained Introns and Constitutively Spliced Introns in Arabidopsis thaliana Using Random Forest and Support Vector Machine
One of the important modes of pre-mRNA post-transcriptional modification is alternative splicing. Alternative splicing allows creation of many distinct mature mRNA transcripts from a single gene by utilizing different splice sites. In plants like Arabidopsis thaliana, the most common type of alternative splicing is intron retention. Many studies in the past focus on positional distribution of r...
A single-nucleotide exon found in Arabidopsis.
The presence of introns in gene-coding regions is one of the most mysterious evolutionary inventions in eukaryotic organisms. It has been proposed that, although sequences involved in intron recognition and splicing are mainly located in introns, exonic sequences also contribute to intron splicing. The smallest constitutively spliced exon known so far has 6 nucleotides, and the smallest alterna...
Applying genetic programming to the prediction of alternative mRNA splice variants.
Genetic programming (GP) can be used to classify a given gene sequence as either constitutively or alternatively spliced. We describe the principles of GP and apply it to a well-defined data set of alternatively spliced genes. A feature matrix of sequence properties, such as nucleotide composition or exon length, was passed to the GP system "Discipulus." To test its performance we concentrated ...
Nascent-seq indicates widespread cotranscriptional pre-mRNA splicing in Drosophila.
To determine the prevalence of cotranscriptional splicing in Drosophila, we sequenced nascent RNA transcripts from Drosophila S2 cells as well as from Drosophila heads. Eighty-seven percent of the introns assayed manifest >50% cotranscriptional splicing. The remaining 13% are cotranscriptionally spliced poorly or slowly, with ∼3% being almost completely retained in nascent pre-mRNA. Although in...
The RAD52-like protein ODB1 is required for the efficient excision of two mitochondrial introns spliced via first-step hydrolysis
Transcript splicing in plant mitochondria involves numerous nucleus-encoded factors, most of which are of eukaryotic origin. Some of these belong to protein families initially characterised to perform unrelated functions. The RAD52-like ODB1 protein has been reported to have roles in homologous recombination-dependent DNA repair in the nuclear and mitochondrial compartments in Arabidopsis thali...